Excel BI - Excel Challenge 709

excel-challenges

excel-formulas

🔰 Answer Expected Data1 Data2 E A, B A, A, A B, C D, D Filter out (remove) those rows where all values get repeated in the cell in Data2 column and entire cell gets repeated in Data…

Published

March 24, 2026

Illustration for Excel BI - Excel Challenge 709

Challenge Description

🔰 Answer Expected Data1 Data2 E A, B A, A, A B, C D, D Filter out (remove) those rows where all values get repeated in the cell in Data2 column and entire cell gets repeated in Data2 column. Ex. A, A, A - All values (A here) get repeated in a cell and this entire cell gets repeated again in Data2 column.

Solutions

library(tidyverse)
library(readxl)

path = "Excel/700-799/709/709 Filter Out Repeats.xlsx"
input = read_excel(path, range = "A2:B10")
test = read_excel(path, range = "C2:D8")

result = input %>%
  filter(!(n_distinct(str_extract_all(Data2, "[A-Z]")[[1]]) == 1 & 
           str_length(Data2) != 1 & 
           n() != 1), .by = Data2) %>%
  select(Data1, Data2)

all.equal(result, test)
# TRUE

Logic: Read the workbook ranges needed for the challenge; Parse the packed text or string structure; Aggregate or rank the data at the required grouping level.
Strengths: The solution stays close to the text pattern itself, which makes the extraction logic easy to audit.
Areas for Improvement: The solution assumes the workbook layout and selected ranges remain stable, so any structural change in the sheet would require small adjustments.
Gem: A small number of well-targeted text patterns does most of the heavy lifting.

import pandas as pd
import re

path = "700-799/709/709 Filter Out Repeats.xlsx"

input = pd.read_excel(path, usecols="A:B", skiprows=1, nrows=9)
test = pd.read_excel(path, usecols="C:D", skiprows=1, nrows=6).rename(columns=lambda x: re.sub(r"\.1$", "", x))

def extract_distinct_uppercase(s):
    return len(set(re.findall(r"[A-Z]", s)))

filtered = input.groupby("Data2").filter(
    lambda group: not (
        extract_distinct_uppercase(group.name) == 1 and
        len(group.name) != 1 and
        len(group) == 2
    )
).reset_index(drop=True)
result = filtered[["Data1", "Data2"]]

print(result.equals(test)) # True

The Python version follows the same grouped logic and keeps the transformation explicit in a dataframe pipeline.

Difficulty Level

Medium

The individual steps are manageable, but the correct transformation pattern is not obvious from the raw data.